Abstract: We address the problem of streaming video
packets from an Access Point (AP) to multiple clients over
a shared wireless channel with fading. In such systems,
each client maintains a buffer of packets from which to
play the video, and an outage occurs in the streaming
whenever the buffer is empty. Clients can switch to a
lower-quality video packet, or request packet transmission
at a higher energy level, in order to minimize the number
of outages plus the number of outage periods and the number
of low-quality video packets streamed, subject to an
average power constraint on the AP. We pose the problem
of choosing the video quality and transmission power as
a Constrained Markov Decision Process (CMDP). We show
that the problem involving N clients decomposes into N
MDPs, each involving only a single client, and furthermore
that the optimal policy has a threshold structure, in which
the decision to choose the video-quality and power-level
of transmission depends solely on the buffer-level.

I. Introduction

Scheduling packets for video streaming over a shared
wireless downlink is attracting increasing attention [18].
Predominantly, this problem has been addressed with the
goal of minimizing the average number of outages,
i.e., time-slots during which a client has no packet to
play [1], [2], [3], [4], [5], [6], [7], [8], [9]. However, the
models considered in these works do not incorporate the 
communication constraints imposed by the network over 
which the streaming occurs. Typically clients streaming 
video files will share a common wireless channel, which 
again typically has a constraint on the average power. 
The access point (AP) has to choose the power level at 
which to transmit individual packets to each client so as 
to maximize the total Quality of Experience (QoE)
experienced by the clients. The system also has an additional
degree of freedom in that the AP can transmit lower 
quality packets on occasion, leading to a softer loss of 
video quality than an abrupt outage. Another important 
aspect is that the quality of video streaming experienced 
by a client depends not only on the number of outages, 
but also on the number of “outage periods”, i.e., the number
of interruption periods. Thus an outage lasting
10 time-slots is not the same as 10 outages each lasting
1 time-slot. The QoE experienced by a client thus has to
take into account several metrics: the average number
of outages, the average number of outage-periods, and
the quality of video-packets streamed. In this paper
we address this overall problem. While we focus here
on the single last-hop case for ease of exposition and
brevity, our results can be generalized to multi-hop
networks as well. In order to provide an uninterrupted
video streaming experience to the clients, the AP has to
guarantee some sort of service regularity to the clients,
i.e., it has to ensure that the packet deliveries to the
clients are not bursty. References [19], [20],
[21], [22], [23] develop a framework to design policies
which provide services to clients in a regular fashion,
though not in a video streaming context.

Fig. 1. Clients streaming video packets from an Access Point over
a shared wireless channel. B denotes the buffer-size, while different
colours denote packets of different video qualities.

II. System Description 

Consider a system where a wireless channel is shared 
by N clients for the purpose of streaming video packets. 
It is assumed that the system evolves over discrete time- 
slots, and one time-slot is taken by the access point (AP) 
for attempting one packet transmission. 

Client n maintains a buffer of size $B_n$ packets and
plays a packet for a duration of $T_n$ time-slots. Once
it has finished playing a video packet, it looks for the
next packet in the buffer. In case the buffer is empty,
there is an “outage”, meaning that the video streaming
is interrupted, and the client has to wait for a packet to
be delivered to its buffer before it can resume the video
streaming.

The wireless channels connecting the clients to the 
AP are assumed to be random. For ease of exposition, 
we will derive the results for the case when the channel 
conditions are fixed. These results carry over to the case 
of fading channels in a straightforward manner. Later,
in Section VIII, we will outline the results for the case
of fading channels.

There are $Q_n$ different video qualities $\{1, 2, \ldots, Q_n\}$ of
packets that can be transmitted for client n, with class
1 video quality providing the best viewing experience.
Similarly there are different power levels
$\{E_1, E_2, \ldots, E_N\}$ at which the packets for client n
can be transmitted. We let $E_1 = 0$, i.e., a user may choose
to not request a packet in a time-slot. The probability that
the packet for client n is successfully delivered upon a
transmission attempt, $P_n(q, E)$, depends on the amount of
power E used in the packet transmission and the quality of
video packet q that was attempted. We also incorporate an
average power constraint on the AP.

The basic problem considered is that of scheduling the 
AP’s packet transmissions to clients so as to maximize 
the combined Quality of Experience (QoE) of the clients. 
The QoE of a single client depends on multiple factors:

1) The average number of outages. 

2) How “often” the video gets interrupted, i.e., the 
number of outage-periods, or the number of time- 
slots in which the transition from “non-outage” to 
outage occurs. 

3) The number of packets of different quality types 
that are streamed. 
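
The distinction between the first two factors can be illustrated with a short sketch. The function name `count_outages` and the convention that the client is not in outage before the first slot are assumptions made here for illustration, not part of the model above.

```python
def count_outages(outage_indicator):
    """Return (outage slots, outage periods) for a 0/1 sequence.

    outage_indicator[s] is 1 if the buffer is empty in slot s.
    Assumption for this sketch: the client is not in outage before slot 0.
    """
    outages = sum(outage_indicator)
    periods = 0
    prev = 0
    for o in outage_indicator:
        # |o * (prev - 1)| equals 1 exactly when a non-outage slot
        # is followed by an outage slot, i.e. a new outage period starts.
        periods += abs(o * (prev - 1))
        prev = o
    return outages, periods


one_long_outage = [0] + [1] * 10 + [0]
ten_short_outages = [0, 1] * 10
print(count_outages(one_long_outage))    # (10, 1)
print(count_outages(ten_short_outages))  # (10, 10)
```

Both sequences contain 10 outage slots, but they differ in the number of outage periods, which is why the two factors are penalized separately.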


III. Problem Formulation

We denote by $O_n(s)$ the random variable that assumes
the value 1 if the n-th client faces an outage at time s,
and 0 otherwise, and by $E_n(s)$ the transmission power
utilized for the n-th client in time-slot s. Also, let
$I_n(q, s)$ be the random variable that takes the value 1
if a packet of quality q is delivered to client n in
time-slot s, and 0 otherwise.

The Constrained Markov Decision Process (CMDP) of
interest, referred to as (1) in the sequel, is to choose
the quality of video packets and transmission power for
each client, in order to minimize the long-run average,
summed over the clients, of the per-slot cost

$$O_n(s) + \lambda_{0,n} \left| O_n(s) \left( O_n(s-1) - 1 \right) \right| + \sum_{q=1}^{Q_n} \lambda_{q,n} I_n(q, s),$$

subject to the constraint that the long-run average of the
total power $\sum_n E_n(s)$ does not exceed $\bar{E}$.

Note that the term $|O_n(s)(O_n(s-1) - 1)|$ assumes
the value 1 if time-slot s is the beginning of an
outage-period for client n, and is 0 otherwise. It thereby
measures the number of outage periods incurred. The
parameters $\{\lambda_{q,n}\}_{q=1}^{Q_n}$ and $\lambda_{0,n}$,
$n = 1, 2, \ldots, N$, are employed for tuning the QoE to
account for the relative importance placed on each of the
objectives. We note that for $i > j$,
$\lambda_{i,n} > \lambda_{j,n}$ for all n, since we assumed
that the video quality of a packet is lower if the packet
belongs to a higher valued class.

Thus the above problem is a CMDP in which the
system state at time t is described by the N-dimensional
vector $L(t) := (l_1(t), l_2(t), \ldots, l_N(t))$, where $l_n(t)$
is the amount of play time remaining in the buffer of
client n at time t.

The central difficulty which arises is that the
cardinality of the state-space of the system increases
exponentially with the number of clients N, and thus
the problem is computationally infeasible as formulated
above.

We show that the problem of serving N clients can be
decomposed into N separate problems each involving
only a single client. Thus the computational complexity
of the problem grows linearly in the number of clients.
Moreover, we show that the optimal policy is easily
implementable since it has a simple threshold structure.


IV. The Dual MDP

The Lagrangian $\mathcal{L}(\pi, \lambda_E)$ associated with a
policy $\pi$ for the system (1) adds the term
$\lambda_E \sum_n E_n(s)$ to the per-slot cost of (1) and
subtracts $\lambda_E \bar{E}$ from the resulting long-run
average (2), where $\lambda_E$ is the Lagrangian multiplier
associated with the average power constraint. The associated
Lagrange dual is

$$D(\lambda_E) = \min_{\pi} \mathcal{L}(\pi, \lambda_E). \tag{3}$$

Next we present a useful bound on the dual, the proof
of which follows from the sub-additivity of lim sup and
super-additivity of lim inf operations.

Lemma 1:

$$D(\lambda_E) \ge \min_{\pi} \sum_n \liminf_{t \to \infty} \frac{1}{t} E \sum_{s=1}^{t} \Big( O_n(s) + \lambda_E E_n(s) + \lambda_{0,n} \left| O_n(s) \left( O_n(s-1) - 1 \right) \right| + \sum_{q=1}^{Q_n} \lambda_{q,n} I_n(q, s) \Big) - \lambda_E \bar{E}. \tag{4}$$
V. Single Client Problem 

We consider minimizing the bound obtained in
Lemma 1. Observing the bound, we find that we have
decomposed the original problem (1) into N single-client
problems, i.e., the expression in the r.h.s. of (4) is
the sum of the costs of N clients, in which the cost of a
single client depends only on the action chosen for it in
each time-slot.

The problem for the single client is described as
follows. We omit the sub-script n in the following
discussion. The channel connecting the client to the AP
is random. The client maintains a buffer of capacity B
time-slots of play-time video (this assumption is
equivalent to the assumption of maintaining a buffer of B
packets since a packet is played for T time-slots), and
in each time-slot, the AP has to choose two quantities,
which together comprise the control action chosen for
the client:

• The video quality $q \in \{1, 2, \ldots, Q\}$.

• The power $E \in \{E_1, E_2, \ldots, E_N\}$ at which to carry
out packet transmission.

The state of the client is thus described by $l(t)$, the
play-time duration of the packets present in the buffer at
time t. If the client is scheduled a packet transmission
of quality q at power E at time t, and the remaining
playtime at time t, $l(t)$, is less than or equal to
$B - T + 1$, then the system state at time $t + 1$ is
$(l(t) - 1)^+ + T$ with a probability $P(q, E)$, while it is
$(l(t) - 1)^+$ with a probability $1 - P(q, E)$. However if
the value of remaining playtime $l(t)$ is strictly greater
than $B - T + 1$, then the system state at time $t + 1$ is
$l(t) - 1$ with a probability 1.

We let

$$S(x) := \begin{cases} (x-1)^+ + T, & \text{if } x \le B - T + 1, \\ x - 1, & \text{if } B - T + 1 < x \le B, \end{cases} \tag{5}$$

$$F(x) := (x-1)^+, \tag{6}$$

be the transitions of the remaining playtime associated
with a successful and a failed packet transmission
respectively. The control action at time t will
be denoted $u(t) := (q(t), E(t))$, where $q(t), E(t)$ are the
video quality and transmission power level chosen at
time t.
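
The two buffer transitions can be written out directly. The following is a minimal sketch; the function names `success_transition` and `failure_transition` and the values B = 10, T = 3 are illustrative choices and do not come from the paper.

```python
def success_transition(x, B, T):
    """Remaining playtime after a successful delivery, as in (5)."""
    if x <= B - T + 1:
        # One slot is played out and T slots of playtime are added.
        return max(x - 1, 0) + T
    # The buffer cannot hold another packet, so only playout happens.
    return x - 1


def failure_transition(x):
    """Remaining playtime after a failed delivery, as in (6)."""
    return max(x - 1, 0)


B, T = 10, 3
for x in (0, 1, 8, 9, 10):
    print(x, success_transition(x, B, T), failure_transition(x))
```

For these values the largest state from which a delivery is still useful is B - T + 1 = 8, where a success fills the buffer exactly to B.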

A transmission at power level E incurs a cost of
$\lambda_E \times E$. There is a penalty of 1 unit upon an
outage at time t. A penalty of $\lambda_q$ units is imposed
if a packet of quality q is delivered to the client, while a
penalty of $\lambda_0$ units is imposed at time t in case
there was no outage at time-slot $t - 1$ and an outage
occurs in time-slot t, i.e., if a new outage-period begins
at time t.

Since the probability distribution of the system state at
time $t + 1$ is completely determined by the system state
at time t, and the action $(q, E)$ chosen at time t, i.e.,
requested video quality and power level at which
transmission occurs, the single client problem is a Markov
Decision Process (MDP) involving only a finite number
of actions and states, and is thus solved by a stationary
Markov policy [12].

Denote by $\pi_n$ a policy for the client n. The single client
problem is to find a policy $\pi_n$ that minimizes the cost of
client n appearing in the r.h.s. of (4).

Denote by $\pi_n^*(\lambda_E)$ the optimal policy which solves
the single client problem. We also let

$$V_n(\lambda_E) := \min_{\pi} \liminf_{t \to \infty} \frac{1}{t} E \sum_{s=1}^{t} \Big( O(s) + \lambda_E E(s) + \lambda_0 \left| O(s) \left( O(s-1) - 1 \right) \right| + \sum_{q=1}^{Q} \lambda_q I(q, s) \Big)$$

be the optimal cost, and $V_n(\lambda_E, \pi)$ be the cost
associated with a policy $\pi$.

VI. Threshold Structure of the Optimal Policy
for the Single Client Problem

We will suppress the subscript n in the following
discussion, and begin with a discussion of the
$\beta$-discounted infinite horizon cost problem for the
single client, where $\beta \in (0, 1)$. Let $V_\beta(x)$
be the minimum $\beta$-discounted infinite horizon cost for
the system starting in state x at time 0, where x can
assume values in the set $\{0, 1, \ldots, B\}$. The function
$V_\beta^s(x)$ is similarly defined to be the minimum
$\beta$-discounted cost incurred in s time-slots for the
system starting in state x, i.e.,

$$V_\beta^s(x) := \min_{\pi^s} E_{\pi^s} \sum_{t=0}^{s-1} \beta^t \Big( O(t) + \lambda_E E(t) + \lambda_0 \left| O(t) \left( O(t-1) - 1 \right) \right| + \sum_{q=1}^{Q} \lambda_q I(q, t) \Big),$$

where $\pi^s$ is a policy for the s horizon $\beta$-discounted
problem. The quantities $V_\beta(x)$, $V_\beta^s(x)$ should not
be confused with the quantities $V_n(\lambda_E)$ defined in the
previous section. We have,

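$$\begin{aligned}
V_\beta^s(x) &= \min_{(q,E)} \Big\{ 1(x = 0) + \lambda_E E + P(q,E) \left[ \lambda_q + \beta V_\beta^{s-1}(S(x)) \right] \\
&\qquad\qquad + (1 - P(q,E)) \left[ 1(x = 1) \lambda_0 + \beta V_\beta^{s-1}(F(x)) \right] \Big\} \\
&= 1(x = 0) + 1(x = 1) \lambda_0 + \beta V_\beta^{s-1}(F(x)) + \min_{u} \left\{ C(u) - P(u) D_\beta^s(x) \right\},
\end{aligned} \tag{10}$$

where

$$C(u) := \lambda_E E + P(q,E) \lambda_q, \tag{11}$$

is the one-step cost associated with the action
$u = (q, E)$, and for $s = 1, 2, \ldots$,

$$D_\beta^s(x) := 1(x = 1) \lambda_0 + \beta \left( V_\beta^{s-1}(F(x)) - V_\beta^{s-1}(S(x)) \right). \tag{12}$$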

We assume that a lower video quality packet, or a higher
power packet transmission, leads to an increase in the
probability of successful packet transmission $P(q, E)$,
i.e., an increase in cost is associated with a higher
transmission success probability.
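
The finite-horizon recursion for the single client can be sketched as a small dynamic program. This is an illustration under stated assumptions rather than the authors' implementation: the function name `solve_single_client`, the success-probability model `toy_p_succ`, the zero terminal cost and all numeric values are hypothetical.

```python
def solve_single_client(B, T, powers, lam_q, lam_E, lam_0, p_succ, beta, horizon):
    """Finite-horizon beta-discounted DP for one client.

    lam_q maps each quality class q to its penalty; p_succ(q, E) is a
    caller-supplied (hypothetical) success-probability model.
    Returns (V, policy): V[x] is the cost over `horizon` slots from
    state x, and policy[x] the first-slot action (q, E).
    """
    def S(x):  # playtime after a successful delivery
        return max(x - 1, 0) + T if x <= B - T + 1 else x - 1

    def F(x):  # playtime after a failed delivery
        return max(x - 1, 0)

    actions = [(q, E) for q in sorted(lam_q) for E in powers]
    V = [0.0] * (B + 1)  # assumption: zero terminal cost
    policy = [actions[0]] * (B + 1)
    for _ in range(horizon):
        V_next = [0.0] * (B + 1)
        for x in range(B + 1):
            best_cost, best_action = None, None
            for q, E in actions:
                p = p_succ(q, E)
                cost = ((1.0 if x == 0 else 0.0) + lam_E * E
                        + p * (lam_q[q] + beta * V[S(x)])
                        + (1 - p) * ((lam_0 if x == 1 else 0.0) + beta * V[F(x)]))
                if best_cost is None or cost < best_cost:
                    best_cost, best_action = cost, (q, E)
            V_next[x], policy[x] = best_cost, best_action
        V = V_next
    return V, policy


def toy_p_succ(q, E):
    # Hypothetical model: no transmission at E = 0; success improves
    # with power and with a lower quality (higher class index q).
    return 0.0 if E == 0 else min(0.95, 0.3 * E + 0.2 * q)


V, policy = solve_single_client(
    B=6, T=2, powers=[0.0, 1.0, 2.0], lam_q={1: 0.0, 2: 0.2},
    lam_E=0.1, lam_0=0.5, p_succ=toy_p_succ, beta=0.9, horizon=50)
for x, (q, E) in enumerate(policy):
    print(x, q, E, round(V[x], 3))
```

Printing the policy against the buffer level x gives a direct way to inspect how the chosen power and quality vary with the remaining playtime for a given set of parameters.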

Definition 1: We say a policy is of threshold-type if it
satisfies the following for each stage s:

• Fix any $E \in \{E_1, E_2, \ldots, E_N\}$. If the policy chooses
the action $(q, E)$ in state x, then it does not choose
the actions $\{(\tilde{q}, E) : \tilde{q} < q\}$ for any state
$1 \le y < x$.

• Fix any $q \in \{1, 2, \ldots, Q\}$. If the policy chooses
the action $(q, E)$ in state x, then it does not choose
the actions $\{(q, \tilde{E}) : \tilde{E} < E\}$ for any state
$1 \le y < x$.

If $x, y \in \{1, 2, \ldots, B\}$ are such that $x > y$, let
$u_x, u_y$ be the actions chosen by a threshold policy $\pi$
in states x and y. Then it is easily verified that
$P(u_x) \le P(u_y)$.
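
The monotonicity of the success probability in the buffer level can be checked mechanically for any tabulated policy. The sketch below assumes a policy stored as a list of (q, E) pairs indexed by state; the helper name and the toy success model are hypothetical.

```python
def success_prob_nonincreasing(policy, p_succ):
    """Check that P(u_x) <= P(u_y) whenever x > y, over states 1..B.

    policy[x] is the action (q, E) taken in state x; state 0 is skipped
    because the property is stated for states in {1, ..., B}.
    """
    probs = [p_succ(q, E) for (q, E) in policy[1:]]
    return all(a >= b for a, b in zip(probs, probs[1:]))


def toy_p_succ(q, E):
    # Hypothetical success model, used only for this illustration.
    return 0.0 if E == 0 else min(0.95, 0.3 * E + 0.2 * q)


monotone = [(2, 2.0), (2, 2.0), (1, 1.0), (1, 0.0)]
print(success_prob_nonincreasing(monotone, toy_p_succ))                   # True
print(success_prob_nonincreasing(list(reversed(monotone)), toy_p_succ))  # False
```

This check covers only the stated consequence of Definition 1, not the definition itself.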

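Next we present a useful lemma that is easily proved.
In the following, $(u, \pi)$ is the policy that follows the
action u in the first slot, and then follows policy $\pi$, while
$V_{\beta,\pi}^s(x)$ is the cost achieved under the policy $\pi$
in s time-slots for the system starting in state x.

Lemma 2: Let $u_1, u_2$ be two actions where
$P(u_2) \ge P(u_1)$, or equivalently,
$1 - P(u_2) \le 1 - P(u_1)$. Then,

$$\begin{aligned}
V_{\beta,(u_2,\pi)}^s(F(x)) - V_{\beta,(u_1,\pi)}^s(S(x))
&= P(u_1) \, \beta \left\{ V_{\beta,\pi}^{s-1}(S(F(x))) - V_{\beta,\pi}^{s-1}(S(S(x))) \right\} \\
&\quad + (1 - P(u_2)) \left\{ 1(F(x) = 1) \lambda_0 + \beta V_{\beta,\pi}^{s-1}(F(F(x))) - \beta V_{\beta,\pi}^{s-1}(F(S(x))) \right\} + C(u_2) - C(u_1) \\
&= P(u_1) \, \beta \left\{ V_{\beta,\pi}^{s-1}(F(S(x))) - V_{\beta,\pi}^{s-1}(S(S(x))) \right\} \\
&\quad + (1 - P(u_2)) \left\{ 1(F(x) = 1) \lambda_0 + \beta V_{\beta,\pi}^{s-1}(F(F(x))) - \beta V_{\beta,\pi}^{s-1}(S(F(x))) \right\} + C(u_2) - C(u_1).
\end{aligned}$$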

Lemma 3: For $s = 1, 2, \ldots$, the functions $D_\beta^s(x)$ are
decreasing in x for $x \in \{1, 2, \ldots, B - T + 1\}$.

Proof: Within this proof, let $\pi_s^*$ be the optimal
policy for the $\beta$-discounted s time-slots problem, and
let $(u, \pi_{s-1}^*)$ be the policy for s time-slots which takes
the action u at the first time-slot, and then follows the
policy $\pi_{s-1}^*$. In order to prove the claim, we will use
induction on s, the number of time-slots.

Let us assume that the statement is true for the
functions $D_\beta^i(x)$, for all $i \le s$. In particular the
function,

$$1(x = 1) \lambda_0 + \beta \left\{ V_\beta^{s-1}(F(x)) - V_\beta^{s-1}(S(x)) \right\}, \tag{13}$$

is decreasing for $x \in \{1, 2, \ldots, B - T + 1\}$.

First we will prove the decreasing property of
$D_\beta^{s+1}(x)$ for $x \in \{2, 3, \ldots, B - T + 1\}$. Now the
assumption (13) made above, and (10), together imply that
$\pi_s^*$ is of threshold-type.

Fix an $x \in \{1, 2, \ldots, B - T\}$ and denote by
$u_1, u_2, u_3, u_4$ the optimal actions at stage s for the
states $S(x), F(x), S(x+1), F(x+1)$ respectively. Note
that the threshold nature of $\pi_s^*$ implies that,

$$P(u_1) \le P(u_2), \quad P(u_3) \le P(u_4), \quad \text{and} \quad P(u_3) \le P(u_1), \quad P(u_4) \le P(u_2).$$

This is true because as the value of state decreases in the
interval $\{1, 2, \ldots, B\}$, a threshold policy switches to an
action that has a higher transmission success probability.
So it follows from Lemma 2 that


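$$\begin{aligned}
V_\beta^s(F(x+1)) - V_\beta^s(S(x+1))
&\le V_{\beta,(u_2,\pi_{s-1}^*)}^s(F(x+1)) - V_\beta^s(S(x+1)) \\
&= C(u_2) - C(u_3) + P(u_3) \, \beta \left\{ V_\beta^{s-1}(F(S(x+1))) - V_\beta^{s-1}(S(S(x+1))) \right\} \\
&\quad + (1 - P(u_2)) \left\{ 1(F(x+1) = 1) \lambda_0 + \beta V_\beta^{s-1}(F(F(x+1))) - \beta V_\beta^{s-1}(S(F(x+1))) \right\} \\
&\le C(u_2) - C(u_3) + P(u_3) \, \beta \left\{ V_\beta^{s-1}(F(S(x))) - V_\beta^{s-1}(S(S(x))) \right\} \\
&\quad + (1 - P(u_2)) \left\{ 1(F(x) = 1) \lambda_0 + \beta V_\beta^{s-1}(F(F(x))) - \beta V_\beta^{s-1}(S(F(x))) \right\} \\
&\le V_\beta^s(F(x)) - V_\beta^s(S(x)),
\end{aligned}$$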


where the first inequality follows since a sub-optimal
action in the state $F(x+1)$ increases the cost-to-go for s
time-slots, the second inequality is a consequence of the
assumption that the functions
$V_\beta^{s-1}(F(x)) - V_\beta^{s-1}(S(x))$
are decreasing in x, while the last inequality follows
from the fact that a sub-optimal action in the state
$S(x)$ will increase the cost-to-go for s time-slots. Thus
we have proved the decreasing property of $D_\beta^{s+1}(x)$ for
$x \in \{2, 3, \ldots, B - T + 1\}$, and it remains to show that
$D_\beta^{s+1}(1) \ge D_\beta^{s+1}(2)$.

Once again, let $u_1, u_2, u_3, u_4$ be the optimal actions at
stage s for the states $T, 0, T+1, 1$ respectively. Using the
same argument as above (i.e., assuming that the actions
taken in stage s at states $T, T+1$ are the same, and the
actions taken in the states $0, 1$ are the same), it follows
that

$$D_\beta^{s+1}(1) - D_\beta^{s+1}(2) \ge (1 + \lambda_0 - \beta \lambda_0) - \left( V_\beta^s(T) - V_\beta^s(T+1) \right).$$

However, $V_\beta^s(T) - V_\beta^s(T+1) \le 1 + \lambda_0 - \beta \lambda_0$
(for s stages, apply the same actions for the system starting
in state T as for a system starting in state $T + 1$,
and note that the two systems couple at a stage $t - 1$,
when the latter system hits the state 1 at any stage t;
the hitting stage is of course random). This gives us,

$$D_\beta^{s+1}(1) - D_\beta^{s+1}(2) \ge 0,$$

and thus we conclude that the function $D_\beta^{s+1}(x)$ is
decreasing for $x \in \{1, 2, \ldots, B\}$. In order to complete
the proof, we notice that for $s = 1$, we have,

$$D_\beta^1(x) = 1(x = 1) \lambda_0,$$

and thus the assertion of the Lemma is true for $s = 1$. ■

Theorem 1: Consider the single client problem
discussed in Section V. There is a threshold policy that is
Blackwell optimal [17], i.e., it is optimal for all values of
$\beta \in (\bar{\beta}, 1)$ for some $\bar{\beta} \in (0, 1)$, and is
also optimal for the average cost problem. Thus
$\pi^*(\lambda_E)$ is of threshold-type and can be obtained by
comparing the costs of all threshold-type policies.

Proof: Fix a q and let $E_i, E_j$, $i > j$, be two power
levels. Without loss of generality, let $u_1 = (q, E_i)$,
$u_2 = (q, E_j)$. Clearly $C(u_1) > C(u_2)$ by (11). In the
Bellman equation (10), consider the term depending on u,
i.e., the term $C(u) - P(u) D_\beta^s(x)$. For
$x, y \in \{1, 2, \ldots, B - T + 1\}$, $x > y$, we have,

$$\begin{aligned}
&\left\{ C(u_1) - P(u_1) D_\beta^s(x) - \left( C(u_2) - P(u_2) D_\beta^s(x) \right) \right\} \\
&\quad - \left\{ C(u_1) - P(u_1) D_\beta^s(y) - \left( C(u_2) - P(u_2) D_\beta^s(y) \right) \right\} \\
&= \left( P(u_1) - P(u_2) \right) \left( D_\beta^s(y) - D_\beta^s(x) \right) \\
&\ge 0,
\end{aligned}$$

where the last inequality follows from Lemma 3. Thus it
follows that if action $u_1$ is preferred over action $u_2$ for
any state x, then $u_1$ will also be preferred over action $u_2$
for any state $y < x$, $y \in \{1, 2, \ldots, B - T + 1\}$. Finally
note that it follows from the Bellman equation (10) and (5),
that the optimal action for states $x > B - T + 1$ is to let
$E = 0$ (since any packet that is received will be lost due
to buffer overflow). The proof for variations in video
quality is similar. Thus it follows from the definition of a
threshold policy that the optimal policy is of threshold
type.

Finally note that the statement regarding Blackwell
optimality follows from the result in the above
paragraph, and because the state-space is finite. ■

VII. Solution of Primal MDP

We now present the solution of the Primal Problem.

Lemma 4: $D(\lambda_E) = \sum_n V_n(\lambda_E) - \lambda_E \bar{E}$.

Proof: Let $\pi^*(\lambda_E) := \otimes_n \pi_n^*(\lambda_E)$ be the
policy obtained by following the policy $\pi_n^*(\lambda_E)$ for
each client n. Then from the definition of the dual function,
the Lagrangian, the cost associated with a policy $\pi$, and
Lemma 1, we have

$$\mathcal{L}(\pi, \lambda_E) \ge D(\lambda_E) \ge \sum_n V_n(\lambda_E) - \lambda_E \bar{E}. \tag{14}$$

However since the policy $\pi^*(\lambda_E)$ is stationary (all the
lim inf and lim sup become lim in the definition of its
Lagrangian, and the associated costs in the single-client
problem change to lim), we have that

$$\mathcal{L}(\pi^*(\lambda_E), \lambda_E) = \sum_n V_n(\lambda_E) - \lambda_E \bar{E},$$

which, along with (14), gives us
$D(\lambda_E) = \sum_n V_n(\lambda_E) - \lambda_E \bar{E}$. ■

Theorem 2: Consider the Primal MDP (1) and its
associated dual problem defined in (3). There exists a price
$\lambda_E^*$ such that $(\pi^*(\lambda_E^*), \lambda_E^*)$ is an
optimal primal-dual pair and thus the policy
$\pi^*(\lambda_E^*)$ solves the Primal MDP.

Proof: We observe that there is a one-to-one
correspondence between any stationary randomized policy,
and the measure it induces on the state-action space,
and thus the Primal MDP can be posed as a linear
program [13], [14]. Thus it follows from Slater’s
condition [15] that for the Primal MDP, strong duality holds
if there exists a policy $\pi$ that satisfies the constraint
$\limsup_{t \to \infty} \frac{1}{t} E \sum_n \sum_{s=1}^{t} E_n(s) < \bar{E}$.
However the policy which never schedules any packets incurs
a net power expenditure of 0, and thus Slater’s condition is
true for the Primal MDP if $\bar{E} > 0$. The claim of the
Theorem then follows from Lemma 4. ■

We note that the policy $\pi^*(\lambda_E^*)$ is a decentralized
policy. That is, the decision to choose the video-quality and
power-level at each time t for client n, i.e.,
$(q_n(t), E_n(t))$, can be taken by client n itself, and doesn’t
require the AP to co-ordinate the clients. Thus a client n
need not know the state values of other clients, $l_m(t)$ for
$m \ne n$, nor does the AP need to know the values of
$l_n(t)$. Thus the policy is easy to implement.

A. Obtaining $\lambda_E^*$ iteratively in a decentralized fashion

We note that in order to implement the optimal policy
$\pi^*(\lambda_E^*)$ as in Theorem 2, we need to find the optimal
value of the price $\lambda_E^*$. We iterate on the price
$\lambda_E$ using the sub-gradient method [16], and since the
dual function is concave, the prices converge to the optimal
value $\lambda_E^*$. Moreover the iterations involving
price-updates are decentralized, i.e., the clients need only
the knowledge of the current price $\lambda_E$ for the
iteration.

Now since $D(\lambda_E) = \mathcal{L}(\pi^*(\lambda_E), \lambda_E)$,
we have,

$$-\frac{\partial D}{\partial \lambda_E} = \bar{E} - E_{\pi^*(\lambda_E)} \sum_n E_n, \tag{15}$$

where $E_{\pi^*(\lambda_E)} \sum_n E_n$ is the expected cost
incurred on the power over all the users. This is the total
“congestion” at the AP. The iteration for $\lambda_E$ is,

$$\lambda_E^{k+1} = \lambda_E^k - \alpha_k d_k,$$

where $d_k$ is the sub-gradient evaluated in (15).
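
The price iteration can be sketched in a few lines. This is a minimal illustration under assumptions that are not in the text: the step sizes are taken as 1/k, the price is projected onto the non-negative reals, and `avg_power_at` is a hypothetical stand-in for the total average power spent when every client follows its optimal single-client policy at the given price.

```python
def update_price(price, avg_power_used, power_budget, step):
    """One sub-gradient step on the price.

    d = budget - used power, so the price rises when the clients
    collectively spend more than the budget and falls otherwise.
    The projection onto [0, inf) is an assumption of this sketch.
    """
    d = power_budget - avg_power_used
    return max(0.0, price - step * d)


def find_price(avg_power_at, power_budget, initial_price=0.0, iterations=200):
    """Iterate the price with diminishing step sizes 1/k (an assumed choice)."""
    price = initial_price
    for k in range(1, iterations + 1):
        price = update_price(price, avg_power_at(price), power_budget, 1.0 / k)
    return price


def toy_avg_power(price):
    # Hypothetical demand curve: clients spend less power as the price grows.
    return 4.0 / (1.0 + price)


price = find_price(toy_avg_power, power_budget=2.0)
print(round(price, 2), round(toy_avg_power(price), 2))
```

For the toy demand curve the budget is met exactly at a price of 1, and the iterates approach that value. In the actual scheme, each evaluation of the power would come from the clients reporting their average power under the current price.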


VIII. Fading Channels

The results in the previous sections can be extended
in a straightforward manner to the case of fading
channels. Let the channel conditions for client n be
described by a Markov process evolving on finitely many
states $\{1, 2, \ldots, C_n\}$ having a transition matrix
$\Pi_n$. The state of client n is described by the vector
$X_n(t) := (l_n(t), c_n(t))$, where $l_n(t)$ is the play-time
duration of the packets present in the buffer at time t, and
$c_n(t)$ is the channel condition at time t. If the client n
is scheduled a packet transmission of quality q at power E
at time t, then the system state at time $t + 1$ is
$(S(l(t)), c)$ with a probability
$P_{n,c_n(t)}(q, E) \, \Pi_n(c_n(t), c)$, while it is
$(F(l(t)), c)$ with a probability
$(1 - P_{n,c_n(t)}(q, E)) \, \Pi_n(c_n(t), c)$.

However now the cost associated to an action u also
depends on the channel condition, i.e.,

$$C_c(u) := \lambda_E E + P_c(q, E) \lambda_q, \tag{16}$$

and a threshold policy will have a threshold structure
for each value of channel condition (as defined in
Section VI).

IX. Concluding Remarks

We have formulated the problem of dynamically
choosing the qualities and power levels for packet
transmissions across unreliable wireless channels so as to
maximize the Quality of Experience of video streaming as an
MDP. Using Lagrangian techniques, we have shown that
the problem exhibits a decentralized solution, wherein
the clients can dynamically decide these quantities on
their own using their local information, i.e., the channel
state and the amount of playtime remaining in their
buffers. Thus the optimal policy can be obtained in time
linear in the number of users.

Furthermore we have shown that the optimal policy
has a threshold structure, thus further reducing the
complexity of searching for the optimal policy. Moreover
due to the threshold nature of the policy, it is easy to
implement.


References

[1] A. ParandehGheibi, M. Medard, A. E. Ozdaglar, and S. Shakkottai,
“Avoiding interruptions - a QoE reliability function for streaming
media applications,” IEEE Journal on Selected Areas in
Communication, vol. 29, no. 5, pp. 1064-1074, 2011.

[2] G. Liang, “Effect of delay and buffering on jitter-free streaming
over random VBR channels,” IEEE Transactions on Multimedia,
vol. 10, no. 6, pp. 1128-1141.

[3] A. S. Rawat and E. Soljanin, “Dynamic control of video quality
for AVS,” in 2014 IEEE International Symposium on Information
Theory.

[4] Y. Xu, E. Altman, R. El-Azouzi, M. Haddad, S. Elayoubi, and
T. Jimenez, “Analysis of buffer starvation with application to
objective QoE optimization of streaming services,” IEEE
Transactions on Multimedia, vol. 16, no. 3, pp. 813-827, April 2014.

[5] G. Tian and Y. Liu, “Towards agile and smooth video adaptation
in dynamic HTTP streaming,” in Proceedings of CoNEXT, 2012,
pp. 109-120.

[6] L. De Cicco, S. Mascolo, and V. Palmisano, “Feedback control for
adaptive live video streaming,” in Proceedings of MMSys, 2011,
pp. 145-156.

[7] T. Hossfeld, S. Egger, R. Schatz, M. Fiedler, K. Masuch, and
C. Lorentzen, “Initial delay vs. interruptions: between the devil
and the deep blue sea,” in Proc. of QoMEX, 2012.

[8] J. De Vriendt, D. De Vleeschauwer, and D. Robinson, “Model for
estimating QoE of video delivered using HTTP adaptive
streaming,” in Proceedings of IFIP/IEEE International Symposium
on Integrated Network Management (IM 2013), 2013.

[9] Y. Xu, S. Elayoubi, E. Altman, and R. El-Azouzi, “Impact of
flow-level dynamics on QoE of video streaming in wireless
networks,” in Proceedings of IEEE INFOCOM, April 2013,
pp. 2715-2723.

[10] E. Altman, Constrained Markov Decision Processes. Taylor &
Francis, 1999.

[11] F. J. Beutler and K. W. Ross, “Optimal policies for controlled
Markov chains with a constraint,” Journal of Mathematical
Analysis and Applications, vol. 112, no. 1, pp. 236-252,
15 November 1985.

[12] M. L. Puterman, Markov Decision Processes: Discrete Stochastic
Dynamic Programming. John Wiley & Sons, Inc., 1994.

[13] A. S. Manne, “Linear programming and sequential decisions,”
Management Science, pp. 259-267, 1960.

[14] V. S. Borkar, “Control of Markov chains with long-run average
cost criterion,” in Stochastic Differential Systems, Stochastic
Control Theory and Applications, pp. 57-77, 1988.

[15] D. P. Bertsekas, Nonlinear Programming. Athena Scientific, 1999.

[16] N. Z. Shor, K. C. Kiwiel, and A. Ruszczynski, Minimization
Methods for Non-differentiable Functions. Springer-Verlag
New York, Inc., 1985.

[17] D. Blackwell, “Discrete dynamic programming,” Annals of
Mathematical Statistics, vol. 33, pp. 719-726, 1962.

[18] Cisco Visual Networking Index (VNI):
http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/white_paper_cll-520862.pdf

[19] R. Singh, I-H. Hou, and P. R. Kumar, “Fluctuation analysis
of debt based policies for wireless networks with hard delay
constraints,” IEEE INFOCOM, 2014, pp. 2400-2408.

[20] R. Singh, I-H. Hou, and P. R. Kumar, “Pathwise performance
of debt based policies for wireless networks with hard delay
constraints,” IEEE 52nd Annual Conference on Decision and
Control (CDC), 2013, pp. 7838-7843.

[21] R. Singh, X. Guo, and P. R. Kumar, “Index policies for
optimal mean-variance trade-off of inter-delivery times in
real-time sensor networks,” IEEE INFOCOM, 2015.

[22] X. Guo, R. Singh, P. R. Kumar, and Z. Niu, “A high reliability
asymptotic approach for packet inter-delivery time optimization
in cyber-physical systems,” ACM MobiHoc, 2015, pp. 197-206.

[23] R. Singh and A. Stolyar, “MaxWeight scheduling: asymptotic
behavior of unscaled queue-differentials in heavy traffic,”
ACM SIGMETRICS, 2015.

